Statistics 


OPENING PROBLEM 


Roland owns 2 hotels, one in New York and one in Miami. 
He wants to find out whether there is a difference in the 
number of nights guests stay at the hotels. 


He therefore inspects the last 40 reservations placed for
each hotel, and records the number of nights the guests
stayed:


New York
2 3 1 2 4 2 6 3 4 5
8 3 1 3 4 2 1 2 4 5
3 6 2 3 2 1 3 6 2 4
8 1 5 7 2 1 8 5 3 2

Miami
2 4 4 5 3 6 2 3 1 7
2 3 4 3 5 6 5 2 4 7
3 2 8 1 7 3 1 2 5 6
4 5 6 4 5 4 8 1 3 7


Things to think about:
a What is the best way to organise this data?
b How can the data be displayed?
c What is the most common length of stay at each hotel?
d How can Roland best measure:
   i the average length of stay for each hotel
   ii the spread of each data set?
e Can a reliable conclusion be drawn from the data? What factors could affect the reliability of
the conclusion?
f How could Roland improve the accuracy of his investigation?


HISTORICAL NOTE


Florence Nightingale (1820-1910) was a British nurse in Turkey 
during the Crimean War. She worked in very difficult conditions, 
with overcrowding, poor sanitation, little food, and few basic supplies. 
Nightingale made a statistical case for the British government
to provide improved facilities. By the time the war ended in 1856,
the hospitals were well-run and efficient, with mortality rates no
greater than those of civilian hospitals in England. Nightingale had earned
an extraordinary reputation, along with the label “the lady with the 
lamp”. 


[Figure: Florence Nightingale]


After returning from the war, Nightingale compiled vast tables of
statistics about how many soldiers died, where, and why. Many of her
findings shocked her. She discovered that in peacetime, soldiers in
England died at twice the rate of civilians, even though they were strong young men. She recognised
that the problems with the military health service extended far beyond the hospitals during wartime.
The statistics also made Nightingale realise that poor sanitation had been the principal cause of most
of the deaths in Turkey. Work conducted in March 1855 by the Turkish Sanitary Commission led
to a dramatic decrease in deaths due to disease. However, Nightingale worried that Queen Victoria
would not properly consider the data presented in the tables, so she found ways to present the data
in charts to persuade the Queen of the need for action.


Nightingale's best-known chart was a variation of a pie graph called the polar area diagram.
It showed the number of deaths each month and their causes. Each month was represented as a
twelfth of a circle. Months with more deaths were shown with longer wedges, and the area of each
wedge represented the number of deaths in that month from wounds, disease, or other causes.
Nightingale used blue wedges to represent disease, red wedges for wounds, and black wedges for
other causes. Using this diagram, Nightingale illustrated the dramatic effect of the Sanitary
Commission's work in 1855, as the wedges were far smaller in the following months.
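Because the area of a wedge, not its length, represents the number of deaths, the radius of a wedge grows only with the square root of the count. The sketch below shows this under two stated assumptions: each death is drawn as a fixed amount of area (`area_per_death` is a hypothetical scale parameter), and each month is treated as a single wedge, ignoring the split by cause. The monthly totals are invented purely for illustration.

```python
import math


def wedge_radius(deaths, area_per_death=1.0):
    """Radius of a one-twelfth-of-a-circle wedge whose area represents `deaths`.

    A wedge covering 1/12 of a circle of radius r has area pi * r**2 / 12.
    Setting that equal to deaths * area_per_death and solving for r gives
    r = sqrt(12 * deaths * area_per_death / pi).
    """
    area = deaths * area_per_death
    return math.sqrt(12 * area / math.pi)


# Hypothetical monthly death counts, chosen only to show how the radius scales.
for deaths in (100, 400, 900):
    print(deaths, round(wedge_radius(deaths), 2))
```

A month with four times as many deaths gets a wedge only twice as long, but its area is four times as large, which is what the diagram asks the reader to compare.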


Nightingale's work had a lasting effect. By the end of the century, Army mortality was lower than
civilian mortality. She wrote, “To understand God's thoughts we must study statistics, for these are
the measure of his purpose.”


In statistics we collect and analyse data to give us an understanding of the world around us. 

Most nations conduct a census at regular intervals to gain information about their populations. The United 
Nations gives assistance to developing countries to help them with census procedures, so that accurate 
and comparable worldwide statistics can be collected. 
A DISCRETE DATA


A discrete variable takes exact numerical values, and is often the result of counting.
ORGANISATION AND DISPLAY OF DISCRETE DATA


A tally and frequency table can be used to organise numerical data. 
The data can then be displayed using a column graph or dot plot. 
For the New York hotel data, we have: 


[Figures: tally and frequency table, column graph, and dot plot for the New York hotel data; both graphs show number of nights on the horizontal axis]
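The counting behind a tally and frequency table can be sketched in Python. This assumes the New York data is the 40 values listed for that hotel in the Opening Problem, read row by row; the helper names `tally_marks` and `frequency_table` are hypothetical, and `||||/` is a plain-text stand-in for a tally group of five.

```python
from collections import Counter

# New York hotel data from the Opening Problem (nights stayed, 40 reservations),
# read row by row.
new_york = [
    2, 3, 1, 2, 4, 2, 6, 3, 4, 5,
    8, 3, 1, 3, 4, 2, 1, 2, 4, 5,
    3, 6, 2, 3, 2, 1, 3, 6, 2, 4,
    8, 1, 5, 7, 2, 1, 8, 5, 3, 2,
]


def tally_marks(freq):
    """Write a tally: each full group of five as '||||/', then any leftover strokes."""
    groups = ["||||/"] * (freq // 5)
    if freq % 5:
        groups.append("|" * (freq % 5))
    return " ".join(groups)


def frequency_table(values):
    """Return (value, tally, frequency) rows for every whole number from min to max."""
    counts = Counter(values)  # values that never occur count as 0
    return [(v, tally_marks(counts[v]), counts[v])
            for v in range(min(values), max(values) + 1)]


table = frequency_table(new_york)
print(f"{'Nights':<8}{'Tally':<14}Frequency")
for nights, tally, freq in table:
    print(f"{nights:<8}{tally:<14}{freq}")
print(f"{'Total':<8}{'':<14}{sum(freq for _, _, freq in table)}")

# The mode is the value with the highest frequency (the first one, if tied).
mode = max(table, key=lambda row: row[2])[0]
print("Mode:", mode)
```

Looping over every whole number from the smallest to the largest value keeps any zero-frequency values in the table, which matches how a column graph leaves a gap rather than skipping the value.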


DESCRIBING THE DISTRIBUTION OF THE DATA SET 




Many data sets show symmetry or partial symmetry about 
the mode, which is the most frequently occurring value. 


If we place a curve over the column graph alongside, we 
see that this curve shows symmetry. We say that we have 
a symmetrical distribution. 


The distribution for the New York hotel data is shown 
alongside. It is said to be positively skewed because, by 
comparison with the symmetrical distribution, it has been 
‘stretched’ on the right or positive side of the mode. 
So, we have:

symmetrical distribution
positively skewed distribution (the positive side is stretched)
negatively skewed distribution (the negative side is stretched)
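The idea of a distribution being "stretched" on one side of the mode can be turned into a rough check. The sketch below is a simple heuristic of our own, not a method from the text: it compares how far the data reaches above the mode with how far it reaches below. It ignores the shape of the middle of the data and is easily fooled by a single outlier, so it only supplements looking at the column graph.

```python
from collections import Counter


def describe_skew(values):
    """Rough label for a distribution, based on how far it stretches from the mode.

    Compares max - mode (positive side) with mode - min (negative side).
    Assumes a single mode; for ties, Counter.most_common returns the tied
    value that was encountered first.
    """
    mode = Counter(values).most_common(1)[0][0]
    negative_side = mode - min(values)
    positive_side = max(values) - mode
    if positive_side > negative_side:
        return "positively skewed"
    if negative_side > positive_side:
        return "negatively skewed"
    return "symmetrical"


print(describe_skew([1, 2, 2, 3, 3, 3, 4, 4, 5]))  # symmetrical
print(describe_skew([1, 2, 2, 2, 3, 3, 4, 5, 8]))  # positively skewed
print(describe_skew([1, 4, 5, 6, 6, 6, 7]))        # negatively skewed
```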


OUTLIERS 




Outliers are data values that are either much larger or much smaller than the general body of data. 
Outliers appear separated from the body of data on a frequency graph. 


For example, in the data set 3, 1, 7, 6, 8, 18, 2, 6, 7, 7, the data value 18 is an outlier. If outliers are 
genuine pieces of data, then they should be included in an analysis of the whole data set. However, if 
outliers occur due to human recording error, they should not be included when the data is analysed. 
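The text identifies outliers by eye. A common numerical convention, which the text itself does not use, is the 1.5 × IQR rule: flag any value more than 1.5 interquartile ranges below the lower quartile or above the upper quartile. The sketch below applies it with Python's `statistics.quantiles`, whose default method is only one of several ways to compute quartiles, so other tools may give slightly different cut-off values. The function name `iqr_outliers` is hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values more than k * IQR below Q1 or above Q3 (the 1.5 x IQR convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [3, 1, 7, 6, 8, 18, 2, 6, 7, 7]
print(iqr_outliers(data))  # [18]
```

Whether a flagged value is then kept or removed still depends on the judgement described above: a genuine value stays in the analysis, and a recording error does not.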


GROUPED DISCRETE DATA 




In situations where many different numerical values are recorded, it may not be practical to use
an ordinary tally and frequency table, or to display the data using a dot plot or column graph. Instead,
we group the data into class intervals.


For example, a local hardware store is studying the number of 
people visiting the store at lunch time. Over 30 consecutive 
weekdays they recorded the data: 

37, 30, 17, 13, 46, 2. 
35, 24, 18, 24, 44. 


40, 28, 38, 24. 
4, 31, 


2, 18, 
38, 41, 38, 


9, 16, 
4, 32. 


In this case, we group the data into class intervals of
length 10. The tally and frequency table is shown
alongside.


We can now use this table to draw a column graph 
for the data. However, we must remember that the 
individual data values are no longer seen. 
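Grouping into class intervals of length 10 can be sketched as follows. The helper `group_counts` is hypothetical, and the sample values are invented for illustration rather than taken from the hardware store's records.

```python
def group_counts(values, width=10, start=0):
    """Count whole-number values in class intervals of equal width.

    Classes are start to start+width-1, start+width to start+2*width-1, and so on.
    Empty classes between the lowest and highest occupied ones are kept with count 0.
    """
    counts = {}
    for v in values:
        lower = start + ((v - start) // width) * width  # lower end of v's class
        counts[lower] = counts.get(lower, 0) + 1
    lowers = range(min(counts), max(counts) + 1, width)
    return [(f"{lo} - {lo + width - 1}", counts.get(lo, 0)) for lo in lowers]


# Hypothetical counts of lunchtime visitors, for illustration only.
sample = [3, 12, 17, 25, 28, 29, 41]
for interval, freq in group_counts(sample):
    print(f"{interval:<9}{freq}")
```

Only the class counts survive the grouping: the output records three values in 20 - 29, but not that they were 25, 28, and 29.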


EXERCISE 9A 


1 A randomly selected sample of shoppers was asked, “How many times did you shop at a supermarket in
the past week?” A column graph was constructed for the results.

[Column graph: Supermarket shoppers; frequency against number of times at the supermarket, 1 to 10]

a How many shoppers gave data in the survey?
b How many of the shoppers shopped once or twice?
c What percentage of the shoppers shopped more than four times?
d Describe the distribution of the data.


2 Employees of a company were asked how many times they left the office on business appointments 
during one week. The following dot plot was constructed from the data: 


[Dot plot: Business appointments out of the office; number of appointments on the horizontal axis]

a How many employees did not leave the office?
b What percentage of the employees left the office more than 5 times?
c Describe the distribution of the data.
d How would you describe the data value 10?


3 Twenty students were asked, “How many TV sets do you have in your household?” The following data
was collected:


2 10.3$1421.340072301 101411 
a Construct a dot plot to display the data.
b How would you describe the distribution of the data? Are there any outliers?
c How many households had no TV sets?
d What percentage of the households had three or more TV sets?


4 The number of toothpicks in a box is stated as 50, but the actual number
of toothpicks has been found to vary. To investigate this, the number of
toothpicks in a box was counted for a sample of 60 boxes. The results were:


50 52 51 50 50 51 52 49 50 48 51 50 47 50 52 48 50 49 51 50 
49 50 52 51 50 50 52 50 53 48 50 51 50 50 49 48 51 49 52 50 
49 49 50 52 50 51 49 52 52 50 49 50 49 51 50 50 51 50 53 48 

a Use a tally and frequency table to organise this data.
b Display the data using a column graph.
c Describe the distribution of the data.
d What percentage of the boxes contained exactly 50 toothpicks?


5 Consider the data for the Miami hotel in the Opening Problem on page 172. 
a Organise the data in a tally and frequency table.
b Draw a column graph of the data.
c Are there any outliers?
d Describe the distribution of the data.
e Compare your column graph with that for the New York hotel on page 173. In which hotel do
guests generally stay longer?


6 The data below are the test scores (out of 100) 
for a Science test for 50 students. 


92 29 78 67 68 58 80 89 92 
69 66 56 88 81 70 73 63 

67 64 62 74 56 75 90 56 47 
59 64 89 39 51 87 89 76 59 
72 80 95 68 80 64 53 43 61 


a Construct a tally and frequency table for this data using class intervals 20 - 29, 30 - 39, ...,
90 - 100.
b What percentage of the students scored 80 or more for the test?
c What percentage of students scored less than 50 for the test?
d Copy and complete the following:
More students had a test score in the interval ...... than in any other interval.
e Describe the distribution of the data.


7 A test score out of 60 marks is recorded for a group of 45 students: 


34 37 44 51 53 39 33 58 40 42 43 43 47 37 35 
41 43 48 50 55 44 44 52 54 59 39 31 29 44 57 
45 34 29 27 18 49 41 42 37 42 43 43 45 34 51 


a Organise the data in a tally and frequency table, using the test score ranges 15 - 19, 20 - 24,
and so on.
b Draw a column graph for the data.
c Describe the distribution of the data.
d An A is awarded to students who scored 50 or more in the test. What percentage of students
scored an A?


EXERCISE 9A (answers)

1 245 shoppers b 18 shoppers c 5.6%
d positively skewed
2 a 10 employees b ≈ 4.44%
c positively skewed d It is an outlier.
3 a [Dot plot: Number of TV sets in students' households]
b positively skewed, no outliers
c 6 households d 15%
4 b [Column graph: Number of toothpicks in boxes; frequency against number of toothpicks, 47 to 53]
c approximately symmetrical d ≈ 38.3%
5 b [Column graph: Miami hotel data; frequency against number of nights]
c no d slightly positively skewed e the Miami hotel
6 d More students had a test score in the interval 60 - 69 than in any other interval.
e negatively skewed
7 b [Column graph: Students' test scores]
c negatively skewed with no outliers d ≈ 22.2%


